A Broad Evaluation of Techniques for Automatic Acquisition of Multiword Expressions

نویسندگان

  • Carlos Ramisch
  • Vitor De Araujo
  • Aline Villavicencio
چکیده

Several approaches have been proposed for the automatic acquisition of multiword expressions from corpora. However, there is no agreement about which of them presents the best cost-benefit ratio, as they have been evaluated on distinct datasets and/or languages. To address this issue, we investigate these techniques analysing the following dimensions: expression type (compound nouns, phrasal verbs), language (English, French) and corpus size. Results show that these techniques tend to extract similar candidate lists with high recall (∼ 80%) for nominals and high precision (∼ 70%) for verbals. The use of association measures for candidate filtering is useful but some of them are more onerous and not significantly better than raw counts. We finish with an evaluation of flexibility and an indication of which technique is recommended for each languagetype-size context.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Generic Framework for Multiword Expressions Treatment: from Acquisition to Applications

This paper presents an open and flexible methodological framework for the automatic acquisition of multiword expressions (MWEs) from monolingual textual corpora. This research is motivated by the importance of MWEs for NLP applications. After briefly presenting the modules of the framework, the paper reports extrinsic evaluation results considering two applications: computer-aided lexicography ...

متن کامل

Une plate-forme générique et ouverte pour le traitement des expressions polylexicales (An Open and Generic Framework for the Acquisition of Multiword Expressions) [in French]

An Open and Generic Framework for the Acquisition of Multiword Expressions In this paper, we present and evaluate an open and flexible methodological framework for the automatic acquisition of multiword expressions (MWEs) from monolingual textual corpora. We start with a pratical motivation followed by a theoretical discussion of the behaviour and of the challenges that MWEs pose for NLP applic...

متن کامل

Alignment-based extraction of multiword expressions

Due to idiosyncrasies in their syntax, semantics or frequency, Multiword Expressions (MWEs) have received special attention from the NLP community, as the methods and techniques developed for the treatment of simplex words are not necessarily suitable for them. This is certainly the case for the automatic acquisition of MWEs from corpora. A lot of effort has been directed to the task of automat...

متن کامل

Multiword Lexical Acquisition And Dictionary Formalization

In this paper, we present the current state of development of a large-scale lexicon built at LabEL1 for Portuguese. We will concentrate on multiword expressions (MWE), particularly on multiword nouns, (i) illustrating their most relevant morphological features, and (ii) pointing out the methods and techniques adopted to generate the inflected forms from lemmas. Moreover, we describe a corpus-ba...

متن کامل

Automated error analysis for multiword expressions: Using BLEU-type scores for automatic discovery of potential translation errors

We describe the results of a research project aimed at automatic detection of MT errors using state-of-the-art MT evaluation metrics, such as BLEU. Currently, these automated metrics give only a general indication of translation quality at the corpus level and cannot be used directly for identifying gaps in the coverage of MT systems. Our methodology uses automatic detection of frequent multiwo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012